Disappearance of Spurious States in Analog Associative Memories

We show that symmetric n-mixture states, when they exist, are almost never stable in autoassociative networks with threshold-linear units. Only with a binary coding scheme could we find a limited region of the parameter space in which either 2-mixtures or 3-mixtures are stable attractors of the dynamics.



I. INTRODUCTION 

Autoassociative networks are useful models of one of the basic operations of cortical networks [1]. 'Hebbian' plasticity on recurrent connections, e.g. in the higher-level areas of sensory cortex and in the hippocampus, is the crucial ingredient for autoassociation to work with real neurons [2]. Neural network models, although very simplified and abstract, allow a comprehensive analysis, indicating whether associative memory retrieval can proceed safely, or whether it must face dynamical hurdles, such as 'spurious' local minima in a free-energy landscape. The dynamics of such networks, in the simplest models, is governed by a number p of dynamical attractors, each of which corresponds to a distribution of neural activity, i.e. a pattern, which represents a long-term memory. Memory is stored by superimposed synaptic weight changes, and the basic operation proceeds by supplying the network with an external signal that acts as a cue, correlated, perhaps only weakly, with a pattern, and which leads through attractor dynamics to the retrieval of the full pattern.

How smoothly can such an operation proceed, and how wide are the basins of attraction of the p memory states? Clearly, these issues depend critically on whether other attractors exist that could hinder or obstruct retrieval. As a crude example, if the cue is correlated with the image of a mule, the net may be able to retrieve either a horse or a donkey, if no "mixed" attractor exists. If instead the encoding procedure has, unintentionally, created a spurious attractor for the mule itself, the network will likely be stuck in such a mixed memory state. In a slightly more complicated model endowed with some topographic mapping of visual space, a horse cue and a donkey cue might be presented simultaneously in neighbouring positions. If they are too close in visual space and spurious attractors exist, this topographic map might retrieve two mules next to each other. Returning to nets without spatial structure and considering for simplicity only symmetric mixtures of patterns embedded with equal strengths, there are obviously p(p - 1)/2 2-mixtures, p(p - 1)(p - 2)/6 3-mixtures, and so on. Do they correspond to stable attractors, and as such do they influence the network dynamics?
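As a quick arithmetic check of these counts, the number of symmetric n-mixtures that can be formed from p patterns is the binomial coefficient "p choose n". The sketch below evaluates it; the function name `count_mixtures` and the value p = 5 are ours, chosen only for illustration.

```python
from math import comb


def count_mixtures(p: int, n: int) -> int:
    """Number of symmetric n-mixtures that can be formed from p stored patterns."""
    return comb(p, n)


if __name__ == "__main__":
    p = 5  # illustrative number of stored patterns (an assumption)
    for n in (2, 3):
        # n = 2 reproduces p(p - 1)/2 and n = 3 reproduces p(p - 1)(p - 2)/6
        print(f"p = {p}, n = {n}: {count_mixtures(p, n)} mixtures")
```

The count grows quickly with p, which is why the stability of even low-order mixtures matters for retrieval.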

In addition, connectionist modelers have proposed to describe certain psychiatric dysfunctions in terms of spurious states [3]. Speech disorders in schizophrenic patients, for instance, might arise from the existence of a large number of spurious states that obstruct the retrieval of correct patterns [4].

In their seminal investigation of the Hopfield model [5], Amit, Gutfreund and Sompolinsky found that while symmetric mixtures of an even number of patterns are unstable, odd mixtures and the spin glass phase can be stable, in a certain region of phase space [6]. In the Hopfield model, though, neurons are modelled as binary units, and correspondingly each distribution of activity, in particular each memory pattern, is a binary vector. Either or both of these aspects might be essential in producing the additional minima in the free-energy landscape. Real neurons behave very differently from binary units in many respects, a basic one being that their spiking activity, once filtered with a short time-kernel [7], is better approximated by an analog variable. Threshold-linear units reproduce this graded nature of neural response, yet still allow for a simple and complete statistical mechanics analysis of autoassociative network models [8]. With threshold-linear units, the memory patterns encoded in the synaptic weights can still be taken to be binary vectors, but can also be taken to be drawn from a distribution with several discrete activity values, or from a continuous distribution [2]. Exponential distributions, in particular, can be argued to be not far from experimentally observed spike count distributions [9].

The question of mixture states in analog nets was first addressed in [10], arguing that the multiple local minima of the spin glass phase are fewer in number in an associative net of units with a more continuous (sigmoid) transfer function. Later it was found, considering threshold-linear units, which are both realistic and amenable to analytical treatment [2], that the region of stability of the spin glass phase is severely restricted with such units [11], again indicative of a general smoothing of the free-energy landscape with analog variables. Although these analyses provide a good starting point, they are not complete, in the sense that they did not show what happens to n-mixture states with n small (the ones relevant to models of schizophrenia), nor what the effect is of different coding schemes, that is, of different pattern distributions. Here, we consider instead symmetric n-mixtures, with n = 2, 3, ..., and we consider non-binary memory vectors. Also, from the biological point of view, it is important to study nets with diluted (incomplete) connectivity, which are much more realistic descriptions of cortical [12] and hippocampal networks [13], where the probability of a recurrent connection between any two units may be of the order of a few percent.

In this manuscript we show that symmetric mixture states give rise to dynamical attractors only in very restricted circumstances, in associative networks of threshold-linear units, both with full and diluted connectivity. We have analysed the validity of this statement in different coding schemes, and did not find any stable mixture state at all when memory patterns are not binary. Essentially, we conclude that spurious states of this type are a pathological feature of the simplified binary models considered in the initial studies.



II. THRESHOLD LINEAR MODEL 

We use a model very similar to that analysed in [8]. We consider a fully connected network of N units, taken to model excitatory neurons. The level of activity of unit i is a dynamical variable $V_i \ge 0$, which corresponds to the short-time averaged firing rate of the neuron. Units are connected to each other through symmetric weights. The specific covariance 'Hebbian' learning rule we consider prescribes that the synaptic weight between units i and j be given as

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \left(\frac{\eta_i^\mu}{a} - 1\right)\left(\frac{\eta_j^\mu}{a} - 1\right) \tag{1}$$

where $\eta_i^\mu$ represents the activity of unit i in pattern $\mu$. Each $\eta_i^\mu$ is taken to be a quenched variable, drawn independently from a distribution $p(\eta)$, with the constraints $\eta \ge 0$, $\langle\eta\rangle_\eta = \langle\eta^2\rangle_\eta = a$. As in one of the first extensions of the Hopfield model [14], we thus allow for the mean activity a of the patterns to differ from the value a = 1/2 of the original model [8].

The model further assumes that the input to unit i takes the form

$$h_i = \sum_{j \neq i} J_{ij} V_j + \sum_{\nu} s^\nu \left(\frac{\eta_i^\nu}{a} - 1\right) + b\!\left(\frac{1}{N}\sum_j V_j\right) \tag{2}$$

where the first term enables the memories encoded in the weights to determine the dynamics; the second term allows for external signals $s^\nu$ to cue the retrieval of one or several patterns; and the third term is unrelated to the memory patterns, but is designed to regulate the activity of the network, so that at any moment in time $\frac{1}{N}\sum_i V_i = \frac{1}{N}\sum_i \eta_i^\mu = a$. The activity of each unit is determined by its input through a threshold-linear function

$$V_i = g\,(h_i - T_{thr})\,\Theta(h_i - T_{thr}) \tag{3}$$

where $T_{thr}$ is a threshold below which the input elicits no output, g is a gain parameter, and $\Theta(\cdot)$ the Heaviside step function. Units are updated, for example, sequentially in random order, possibly subject to fast noise. The exact details of the updating rule and of the noise are not specified further here, because they do not affect the steady states of the dynamics, and we take the noise level T to be vanishingly small, $T \to 0$. Discussions about the biological plausibility of this model for networks of pyramidal cells can be found in [2,15], and will not be repeated here.

Subject to the above dynamics, the network evolves towards one of a set of attractor states. In a given attractor the network may still wander among a variety of configurations, but it reaches a stationary probability distribution of being in any particular configuration. The average of any quantity over such 'annealed' probability distribution is denoted by $\langle\cdot\rangle$ (whereas $\langle\cdot\rangle_\eta$ denotes the average over the quenched distribution $p(\eta)$). To analyse such a model one can introduce, as in [8], the order parameters:

$$x = \frac{1}{N}\sum_i \langle V_i \rangle \tag{4}$$

$$x^\sigma = \frac{1}{N}\sum_i \left(\frac{\eta_i^\sigma}{a} - 1\right)\langle V_i \rangle \tag{5}$$

$$y_0 = \frac{1}{N}\sum_i \langle V_i^2 \rangle \tag{6}$$

$$y_1 = \frac{1}{N}\sum_i \langle V_i \rangle^2 \tag{7}$$

where x is simply the mean activity of the network, and $x^\sigma$ the subtracted, or specific, overlap of the current state of the network with each of the stored patterns. Two further parameters,

$$\psi = (y_0 - y_1)\,\frac{T_0}{T} \tag{8}$$

$$\rho^2 = \frac{p\,y_1}{N(1 - \psi)^2} \tag{9}$$

can be defined as a function of $y_0$ and $y_1$, and play a particularly useful role in the analysis in the limit we consider, $T \to 0$, when one configuration dominates the annealed average, and $y_1 \simeq y_0 + O(T)$. The characteristic noise scale of the system is $T_0 = (1 - a)/a$ [8], and we define the storage load $\alpha = p/N$. In the limit $N \to \infty$, $T \to 0$, the system is thus characterized by the parameters a (mean pattern activity, which also parametrizes the coding sparseness [8] in the sense that decreasing a makes the code sparser), $\alpha$ (storage load), g (gain) and $T_{thr}$ (threshold).
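The model of this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the code used for the simulations reported below: it assumes binary patterns, writes the covariance rule as J_ij = (1/N) sum_mu (eta_i/a - 1)(eta_j/a - 1) with no self-coupling, uses noiseless sequential updates in random order, and replaces the activity-regulating term b(x) by a constant field b. All function names and parameter values are ours.

```python
import numpy as np


def make_binary_patterns(p, N, a, rng):
    """Draw p binary patterns over N units: eta = 1 with probability a, else 0."""
    return (rng.random((p, N)) < a).astype(float)


def covariance_weights(eta, a):
    """Covariance rule J_ij = (1/N) sum_mu (eta_i/a - 1)(eta_j/a - 1), with J_ii = 0."""
    N = eta.shape[1]
    d = eta / a - 1.0
    J = d.T @ d / N
    np.fill_diagonal(J, 0.0)  # assumption: no self-coupling
    return J


def threshold_linear(h, g, T_thr):
    """V = g (h - T_thr) above threshold, and 0 below it."""
    return g * np.maximum(h - T_thr, 0.0)


def overlaps(V, eta, a):
    """Specific overlaps x^sigma = (1/N) sum_i (eta_i^sigma/a - 1) V_i."""
    return (eta / a - 1.0) @ V / V.size


def sequential_sweep(V, J, g, T_thr, b, rng):
    """Update every unit once, in random order; b is a constant stand-in for b(x)."""
    V = V.copy()
    for i in rng.permutation(V.size):
        V[i] = threshold_linear(J[i] @ V + b, g, T_thr)
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, p, N = 0.2, 5, 500  # illustrative values only
    eta = make_binary_patterns(p, N, a, rng)
    J = covariance_weights(eta, a)
    V = sequential_sweep(eta[0], J, g=0.5, T_thr=0.0, b=0.0, rng=rng)
    print(overlaps(V, eta, a))
```

A state equal to a stored binary pattern has specific overlap close to 1 - a with that pattern and close to zero with the others, which is a convenient sanity check before running longer dynamics.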
III. MEAN FIELD SOLUTIONS AND THEIR STABILITY

We calculate the free energy using the replica trick, for symmetric n-mixture states (where n overlaps take the same non-zero value, and the rest are zero) elicited by external signals $s^1 = \ldots = s^n = s$. These signals can be purely transient, so that at steady state $s = 0$, but we consider a non-zero steady value for the sake of generality. We look for symmetric states, characterized by non-zero $x^1 = \ldots = x^n = x$. The saddle point equations reduce to [8]:



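$$x = g' \left\langle\!\left\langle \int_{h > T_{thr}} Dz\,(h - T_{thr}) \right\rangle\!\right\rangle \tag{10}$$

$$x^\sigma = g' \left\langle\!\left\langle \left(\frac{\eta^\sigma}{a} - 1\right) \int_{h > T_{thr}} Dz\,(h - T_{thr}) \right\rangle\!\right\rangle \tag{11}$$

$$\psi = T_0\, g' \left\langle\!\left\langle \int_{h > T_{thr}} Dz \right\rangle\!\right\rangle \tag{12}$$

$$y_0 = (g')^2 \left\langle\!\left\langle \int_{h > T_{thr}} Dz\,(h - T_{thr})^2 \right\rangle\!\right\rangle \tag{13}$$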

p = jtw m = w^fy 9= ^w> (14) 

where now the input to each unit can be expressed as:

$$h = b(x) + \sum_\sigma \left(\frac{\eta^\sigma}{a} - 1\right)(x^\sigma + s^\sigma) - z\,T_0\,\rho \tag{15}$$

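and the free energy reads:

$$f = [\ldots] \left\langle\!\left\langle \int_{h > T_{thr}} Dz\,(h - T_{thr})^2 \right\rangle\!\right\rangle + \frac{1}{2}\sum_\sigma (x^\sigma)^2 + x\,b(x) - B(x) + [\ldots]\,\rho^2$$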

If one defines new parameters $v = (x + s)/(T_0\rho)$ (the specific signal-to-noise ratio) and $w = [b(x) - nx - T_{thr}]/(T_0\rho)$ (a sort of uniform field-to-noise ratio), it is easy to show that the mean field equations can be reduced to:

$$E_1(w, v) = (A_1 + \delta A_2)^2 - \alpha A_3 = 0 \tag{16}$$

$$E_2(w, v) = (A_1 + \delta A_2)\left([\ldots]\,A_2\right) - \alpha A_2 = 0 \tag{17}$$

where

$$A_1(w, v) = A_2(w, v) - \left\langle \int_{+} Dz \right\rangle_\eta \tag{18}$$

$$A_2(w, v) = \frac{1}{n v T_0} \left\langle \left(\frac{\Upsilon}{a} - n\right) \int_{+} Dz \left(w + v\frac{\Upsilon}{a} - z\right) \right\rangle_\eta \tag{19}$$

$$A_3(w, v) = \left\langle \int_{+} Dz \left(w + v\frac{\Upsilon}{a} - z\right)^2 \right\rangle_\eta \tag{20}$$

with $\Upsilon = \sum_{\sigma=1}^{n} \eta^\sigma$ and $\delta = s/x$. In the equations above, $Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$ and the subscript + indicates that the z-average has to be carried out only in the range where $w + v\Upsilon/a - z > 0$. In the following we take $\delta = 0$. Thus symmetric n-mixture attractors exist if we can find stable solutions of Eqs. 16, 17.
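The Gaussian integrals restricted to the range where the argument is positive have simple closed forms, which is what makes these equations easy to evaluate numerically. The sketch below is our own illustration: writing X for the quantity w + v Υ/a (an assumed reading of the integrand), it gives the integrals of 1, (X - z) and (X - z)^2 over z < X in terms of the standard normal density and cumulative distribution, together with a helper for the average over Υ under binary coding, where Υ follows a binomial distribution with parameters n and a. The function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def I0(X):
    """Integral of Dz over z < X."""
    return Phi(X)


def I1(X):
    """Integral of Dz (X - z) over z < X."""
    return X * Phi(X) + phi(X)


def I2(X):
    """Integral of Dz (X - z)^2 over z < X."""
    return (1.0 + X * X) * Phi(X) + X * phi(X)


def binary_average(f, n, a):
    """Average of f(Upsilon) for binary coding, Upsilon ~ Binomial(n, a)."""
    return sum(
        math.comb(n, k) * a**k * (1.0 - a) ** (n - k) * f(k)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    w, v, n, a = -0.5, 1.0, 2, 0.3  # illustrative values only
    # Fraction of the Gaussian weight above threshold, averaged over Upsilon
    print(binary_average(lambda k: I0(w + v * k / a), n, a))
```

For non-binary coding schemes the binomial sum would have to be replaced by the corresponding average over the distribution of Υ.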

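To analyze the stability of the extrema of the free energy, one has to study the Hessian matrix

$$H^{\mu\nu} = \delta^{\mu\nu} - \left\langle \left(\frac{\eta^\mu}{a} - 1\right)\left(\frac{\eta^\nu}{a} - 1\right) \int_{+} Dz \right\rangle_\eta \tag{21}$$

around the saddle point.

In general, for n-mixture states, there are three types of eigenvalues:

1. a non-degenerate eigenvalue, which decides the stability against a uniform increase in the amplitude of the n patterns that contribute to the thermodynamic state (i.e. the 'condensed' patterns), while the other overlaps remain zero. It is (for $\mu \neq \nu$)

$$\lambda_1 = 1 - \left\langle \left[\left(\frac{\eta^\mu}{a} - 1\right)^2 + (n - 1)\left(\frac{\eta^\mu}{a} - 1\right)\left(\frac{\eta^\nu}{a} - 1\right)\right] \int_{+} Dz \right\rangle_\eta \tag{22}$$

2. an eigenvalue of degeneracy n - 1, associated with any direction which tends to change the relative amplitude of the non-zero overlaps. It is (again for $\mu \neq \nu$)

$$\lambda_2 = 1 - \left\langle \left[\left(\frac{\eta^\mu}{a} - 1\right)^2 - \left(\frac{\eta^\mu}{a} - 1\right)\left(\frac{\eta^\nu}{a} - 1\right)\right] \int_{+} Dz \right\rangle_\eta \tag{23}$$

3. the third eigenvalue, with degeneracy p - n, measures the stability against the appearance of additional overlaps.

$$\lambda_3 = 1 - T_0 \left\langle \int_{+} Dz \right\rangle_\eta \tag{24}$$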
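The degeneracies listed above follow from the symmetry of the mixture state alone, and can be checked on a toy matrix. The sketch below is our own illustration and not the actual Hessian: it assumes a p x p symmetric matrix whose n 'condensed' directions share one diagonal value d and one off-diagonal value o, whose p - n remaining directions share a diagonal value c, and in which the two groups are not coupled. Its spectrum is d + (n - 1) o once, d - o with degeneracy n - 1, and c with degeneracy p - n.

```python
import numpy as np


def toy_hessian(p, n, d, o, c):
    """Toy matrix with the permutation symmetry of a symmetric n-mixture.

    d, o: diagonal and off-diagonal entries within the n condensed directions
    c:    diagonal entry of the p - n uncondensed directions
    The numerical values are hypothetical; only the symmetry is meant to match.
    """
    H = np.zeros((p, p))
    H[:n, :n] = o
    H[np.arange(n), np.arange(n)] = d
    H[np.arange(n, p), np.arange(n, p)] = c
    return H


if __name__ == "__main__":
    p, n = 6, 3
    eigenvalues = np.linalg.eigvalsh(toy_hessian(p, n, d=1.0, o=0.2, c=0.5))
    values, counts = np.unique(np.round(eigenvalues, 10), return_counts=True)
    for value, count in zip(values, counts):
        print(f"eigenvalue {value:.3f} with degeneracy {count}")
```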



IV. DIFFERENT CODING SCHEMES 

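In order to proceed further, we restrict the analysis to a number of specific coding schemes, i.e., to different choices for the distribution $p(\eta)$. We consider

$$
\begin{aligned}
p(\eta) &= a\,\delta(\eta - 1) + (1 - a)\,\delta(\eta), && \text{binary} \\
p(\eta) &= \frac{a}{3}\,\delta\!\left(\eta - \frac{3}{2}\right) + a\,\delta\!\left(\eta - \frac{1}{2}\right) + \left(1 - \frac{4a}{3}\right)\delta(\eta), && \text{ternary} \\
p(\eta) &= 4a\,e^{-2\eta} + (1 - 2a)\,\delta(\eta), && \text{exponential.}
\end{aligned} \tag{25}
$$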
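The three coding schemes can be sampled directly, which gives a simple check of the constraints that the mean and the mean square of η both equal a. The sketch below is our own illustration and assumes the following readings of the three distributions: binary (η = 1 with probability a), ternary (η = 3/2 with probability a/3 and η = 1/2 with probability a), and exponential (η drawn from an exponential density with rate 2 with probability 2a), with η = 0 otherwise in each case. The function name `sample_eta` is hypothetical.

```python
import numpy as np


def sample_eta(scheme, a, size, rng):
    """Draw pattern activities eta >= 0 with <eta> = <eta^2> = a.

    Assumes a <= 3/4 for the ternary scheme and a <= 1/2 for the exponential one,
    so that all probabilities are well defined.
    """
    u = rng.random(size)
    if scheme == "binary":
        return (u < a).astype(float)
    if scheme == "ternary":
        # eta = 3/2 with probability a/3, eta = 1/2 with probability a
        return np.where(u < a / 3, 1.5, np.where(u < 4 * a / 3, 0.5, 0.0))
    if scheme == "exponential":
        # active with probability 2a, then exponential with rate 2 (scale 1/2)
        active = u < 2 * a
        return np.where(active, rng.exponential(scale=0.5, size=size), 0.0)
    raise ValueError(f"unknown coding scheme: {scheme}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = 0.3  # illustrative sparseness
    for scheme in ("binary", "ternary", "exponential"):
        eta = sample_eta(scheme, a, 200_000, rng)
        print(scheme, eta.mean(), (eta**2).mean())
```

Both printed moments should be close to a for each scheme, up to sampling fluctuations.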

For small values of the load $\alpha$ (and hence of the quenched noise $\rho$), Eq. 17 describes a hyperbola, whose center depends on the value of g. Eq. 16 instead, for small values of $\alpha$, is a closed curve in the quadrant $w < 0$, $v > 0$, so that with an appropriate choice of g the two curves intersect at two points. As $\alpha$ grows, the region $E_1(v, w) > 0$ shrinks in size, until at a certain value of $\alpha$, which depends only on a, n and the coding scheme, it reduces to a point and then disappears.

We have investigated not just the existence but also the stability of solutions for symmetric 2- and 3-mixture states. The solutions behave in exactly the same manner in these two cases: for small values of a both intersections discussed above are unstable, in the sense that both $\lambda_1$ and $\lambda_2$ are negative. This finding is confirmed by computer simulation, in which one of the overlaps tends to grow, reaching the corresponding attractor, whereas the other one (or the other two in the case of 3-mixtures) tends to zero. Increasing the value of the sparsity parameter, one finds different results with binary coding and with other types of coding.

Let us consider binary coding first. After a range of a-values with only one unstable eigenvalue ($\lambda_1$ or $\lambda_2$), one finds a range where genuinely stable solutions can be found. Thus the retrieval of mixture patterns is possible for binary coding, as can be seen in the simulations shown in Fig. 1.

FIG. 1. Computer simulation results for N = 10000, p = 5 and (a,b): g = 1.2, a = 0.4; (c,d): g = 3, a = 0.6. In (a,c) the initial state was correlated equally with 2 patterns, in (b,d) with 3.

The exact stability region in the ($\alpha$, a) plane differs for 2-mixtures and 3-mixtures. In both cases, it is delimited to the right by the 'critical load' $\alpha_c(a, n)$, i.e. the value at which the island with $E_1(v, w) > 0$ shrinks to zero, and to the left by the load $\alpha$ beyond which no intersection with both $\lambda_1 > 0$ and $\lambda_2 > 0$ can be found. Fig. 2 illustrates these stability regions, compared with the critical load for the pure attractor states, as in [8].

FIG. 2. Storage capacity as a function of sparseness for the single pattern states (full line), compared to the region of existence and stability of 2-mixtures (dashed line) and 3-mixtures (dashed-dotted line), for the binary coding scheme, in a fully connected network with threshold-linear units. Points denote the $\alpha$, a values used in the simulations of Figure 1.

For ternary and exponential coding, the solutions of the saddle point equations remain unstable even for very high values of the sparsity parameter a. Again, this was verified by computer simulations. Fig. 3 illustrates the different situation occurring with ternary and binary coding, by considering a very low load and a sparsity value for which stable solutions for 3-mixtures are easily found in the binary case. Note that the 'critical load' for 3-mixtures would be considerably higher with ternary patterns (not shown); the fact is that at each position of the intersection, either $\lambda_1$ or $\lambda_2$ or both turn out to be negative. This complex behaviour of eigenvalues will be discussed elsewhere in more detail.

FIG. 3. The final overlaps with each of 4 stored patterns, averaged over the last 400 updates of the whole network, for left: binary and right: ternary coding, in both cases with N = 10000, a = 0.5, p = 4. 3 patterns have initially a non-zero overlap with the activity of the network and retain it, for most g values, in the binary coding case, while a single pattern is always selected in the ternary case.



V. DILUTED CASE 

We have also extended the analysis to a highly diluted network [16]. In this case the number of patterns that can be stored scales with the number C of connections each unit receives, rather than with the number of units N. One then redefines the load parameter as $\alpha = p/C$. The essential difference introduced by the sparse (i.e. diluted) connectivity is that noise has less of an opportunity to reverberate along closed loops. In fact the signal, which during retrieval is simply contributed by the 'condensed' patterns, propagates coherently and proportionally to C, independently of the density of feedback loops in the network. The fluctuations in the overlaps with the uncondensed patterns, which as $T \to 0$ represent the sole source of noise, propagate coherently along feedback loops, giving rise to the amplifying factor $1/(1 - \psi)$ of the fully connected case. For a given load (fixed $\alpha$), diluted connectivity therefore reduces the influence of this 'static' noise, and performance is better than in the fully connected case with $N - 1 = C$. In particular with extreme dilution, that is, if the condition $C/\ln N \to 0$ is satisfied, one can neglect correlations among the C inputs to a given unit [16], and the mean field equations become [17]:



$$E_1(w, v) = (A_2 + \delta A_2)^2 - \alpha A_3 = 0 \tag{26}$$

$$E_2(w, v) = \left(\frac{1}{g T_0} - A_2\right) = 0 \tag{27}$$


Examining again the stability matrix, we find that the mixture solutions that were present with binary coding and large values of a still survive. By the same token, the results for ternary and exponential coding are not affected, in the sense that no stable solutions can be found even in the highly diluted case.
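A diluted network of this kind can be set up numerically along the following lines. This is a sketch under our own assumptions, not the construction used for the results above: each unit receives exactly C inputs from other units chosen at random, the covariance weights on the existing connections are normalized by C rather than by N, and the load is then p/C. The function names are hypothetical; note that the resulting weight matrix is in general not symmetric.

```python
import numpy as np


def diluted_weights(eta, a, C, rng):
    """Covariance weights on a random diluted graph with C inputs per unit.

    Returns (J, c): J[i, j] is the weight from unit j to unit i, and c is the
    boolean connectivity matrix with exactly C True entries in each row.
    """
    p, N = eta.shape
    d = eta / a - 1.0
    c = np.zeros((N, N), dtype=bool)
    for i in range(N):
        others = np.delete(np.arange(N), i)  # no self-connections
        c[i, rng.choice(others, size=C, replace=False)] = True
    J = c * (d.T @ d) / C  # assumption: normalization by C instead of N
    return J, c


def storage_load(p, C):
    """Load parameter of the diluted network, alpha = p / C."""
    return p / C


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a, p, N, C = 0.3, 4, 400, 20  # illustrative values only
    eta = (rng.random((p, N)) < a).astype(float)
    J, c = diluted_weights(eta, a, C, rng)
    print("inputs per unit:", c.sum(axis=1)[:5], "load:", storage_load(p, C))
```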



VI. CONCLUSION 

The conclusion is that the existence of stable mixture states in a restricted region of the parameter space should be regarded as almost a pathological feature, resulting from binary coding. If one considers mixture states as spurious states, to be avoided, then one notes that the introduction of analog variables, a more realistic description of neural activity, goes a long way towards disposing of spurious states, just as it almost eliminated the spin glass phase [11]. The remaining region of stability of spurious states is definitely eliminated by non-binary coding schemes, which further contribute to smoothing the free-energy landscape. This result casts doubt upon, e.g., models of schizophrenia that are based on the existence of spurious attractors.



These results may well have implications in domains outside computational neuroscience. The smoothness of the free-energy landscape is a crucial feature of many interacting systems used to map optimization problems, such as the travelling salesman [18] or the graph matching problem [19]. Optimization generally fails if the dynamics gets stuck in local minima. Our result indicates that undesired local minima may be eliminated by a combination of analog variables and coding schemes, which may in some cases be manipulated while mapping the problem at hand onto a dynamical system.